Smart Paper Notes

Scalable and Efficient Neural Speech Coding: A Hybrid Design

Kai Zhen, Jongmo Sung, Mi Suk Lee, Seungkwon Beak, Minje Kim

Category: Machine Learning

2021-03-27

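We propose a scalable and efficient neural waveform coding system for speech compression. We formulate speech coding as an autoencoding task, in which a convolutional neural network (CNN) performs encoding and decoding in its feedforward routine, acting as a neural waveform codec (NWC). The proposed NWC also defines quantization and entropy coding as trainable modules, so coding artifacts and bitrate control are handled during optimization. We achieve efficiency by introducing compact model components into the NWC, such as gated residual networks and depthwise separable convolutions. In addition, the proposed model has a scalable architecture, cross-module residual learning (CMRL), to cover a wide range of bitrates. To this end, we adopt the residual coding concept to concatenate multiple NWC autoencoding modules, where each NWC module performs residual coding to recover whatever reconstruction loss its preceding modules have left. CMRL can also scale down to cover lower bitrates, because it employs a linear predictive coding (LPC) module as its first autoencoder. The hybrid design integrates LPC and NWC by redefining LPC quantization as a differentiable process, which allows the whole system to be trained end to end. The decoder of the proposed system uses either one NWC (0.12 million parameters) in the low-to-medium bitrate range (12 to 20 kbps) or two NWCs at the high bitrate (32 kbps). Although its decoding complexity is not yet as low as that of conventional speech codecs, it is significantly lower than that of other neural speech coders, such as WaveNet-based vocoders. In wideband speech coding quality, our system performs comparably to or better than AMR-WB on the TIMIT test utterances at low and medium bitrates. The proposed system can also scale up to higher bitrates to achieve near-transparent quality.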
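The cross-module residual learning (CMRL) idea described above, where each module codes whatever its predecessors failed to reconstruct and the decoder sums all module outputs, can be illustrated with a toy sketch. It assumes plain uniform scalar quantizers as stand-ins for the LPC and NWC autoencoders; all function names (`hard_quantize`, `soft_quantize`, `uniform_grid`, `cascaded_residual_coding`) and the sharpness parameter `alpha` are hypothetical. `soft_quantize` shows one common way to make quantization differentiable (a softmax over distances to centroids); it is offered only as an assumption about what a trainable quantizer can look like, not as the paper's exact formulation.

```python
import math
from typing import List, Sequence, Tuple


def hard_quantize(x: float, centroids: Sequence[float]) -> float:
    """Nearest-centroid assignment, as used when actually encoding."""
    return min(centroids, key=lambda c: abs(x - c))


def soft_quantize(x: float, centroids: Sequence[float], alpha: float) -> float:
    """Softmax-weighted mix of centroids (a soft-to-hard surrogate).

    Differentiable in x and in the centroids; approaches hard_quantize as
    alpha grows. A surrogate like this would replace hard_quantize during
    training; the cascade below uses hard assignment, as at inference.
    """
    logits = [-alpha * (x - c) ** 2 for c in centroids]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    return sum(w * c for w, c in zip(weights, centroids)) / sum(weights)


def uniform_grid(half_range: float, step: float) -> List[float]:
    """Symmetric grid of centroids in [-half_range, half_range], including 0."""
    n = round(half_range / step)
    return [k * step for k in range(-n, n + 1)]


def cascaded_residual_coding(
    signal: Sequence[float], stages: Sequence[Sequence[float]]
) -> Tuple[List[float], List[float]]:
    """Each stage codes what the previous stages failed to reconstruct.

    Returns the final reconstruction (sum of all stage outputs) and the
    mean squared error measured after each stage.
    """
    recon = [0.0] * len(signal)
    mse_per_stage = []
    for centroids in stages:
        # Input to this stage: residual of the running reconstruction.
        residual = [s - y for s, y in zip(signal, recon)]
        coded = [hard_quantize(r, centroids) for r in residual]
        recon = [y + c for y, c in zip(recon, coded)]
        err = [s - y for s, y in zip(signal, recon)]
        mse_per_stage.append(sum(e * e for e in err) / len(signal))
    return recon, mse_per_stage


if __name__ == "__main__":
    # Toy "speech": a short sinusoid in [-1, 1].
    x = [math.sin(2 * math.pi * 3 * n / 64) for n in range(64)]
    # Each finer grid spans the largest residual the previous stage can leave.
    stages = [uniform_grid(1.0, 0.5), uniform_grid(0.25, 0.125), uniform_grid(0.0625, 0.03125)]
    _, mse = cascaded_residual_coding(x, stages)
    print(mse)
```

Because each stage's grid contains 0 and spans the largest residual the previous stage can leave, the per-sample error never grows in this toy setting, so adding stages (more bits) can only lower the MSE.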


Cartoonization is the task of rendering natural photos in cartoon styles. Previous deep cartoonization methods have focused only on end-to-end translation, which may hinder editability. Instead, we propose a novel solution that enables editing of texture and color, based on the cartoon creation process. To do so, we design a model architecture with separate texture and color decoders to decouple these attributes. In the texture decoder, we propose a texture controller that enables a user to control stroke style and abstraction to generate diverse cartoon textures. We also introduce an HSV color augmentation to induce the networks to generate diverse and controllable color translations. To the best of our knowledge, our work is the first deep approach to control cartoonization at inference time, while showing a substantial quality improvement over baselines.
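The HSV color augmentation mentioned above is not specified in detail in this abstract. As a rough illustration only, the sketch below applies a generic per-image HSV jitter using Python's standard `colorsys` module: a cyclic hue shift plus clamped saturation and value scaling. The function names and the parameter ranges (`max_hue_shift`, `sat_range`, `val_range`) are hypothetical choices, not values from the paper.

```python
import colorsys
import random
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]  # each channel in [0, 1]


def _clamp01(x: float) -> float:
    return min(max(x, 0.0), 1.0)


def hsv_augment(
    pixels: Sequence[RGB], hue_shift: float, sat_scale: float, val_scale: float
) -> List[RGB]:
    """Shift hue (wrapping around the color wheel) and scale saturation/value."""
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        h = (h + hue_shift) % 1.0  # colorsys hue is cyclic in [0, 1)
        s = _clamp01(s * sat_scale)
        v = _clamp01(v * val_scale)
        out.append(colorsys.hsv_to_rgb(h, s, v))
    return out


def random_hsv_augment(
    pixels: Sequence[RGB],
    rng: random.Random,
    max_hue_shift: float = 0.1,
    sat_range: Tuple[float, float] = (0.7, 1.3),
    val_range: Tuple[float, float] = (0.8, 1.2),
) -> Tuple[List[RGB], Tuple[float, float, float]]:
    """Sample one set of HSV parameters per image and apply it to every pixel.

    The sampled (hue_shift, sat_scale, val_scale) are returned for inspection.
    """
    params = (
        rng.uniform(-max_hue_shift, max_hue_shift),
        rng.uniform(*sat_range),
        rng.uniform(*val_range),
    )
    return hsv_augment(pixels, *params), params


if __name__ == "__main__":
    image = [(0.9, 0.2, 0.1), (0.2, 0.6, 0.3), (0.5, 0.5, 0.5)]
    augmented, params = random_hsv_augment(image, random.Random(0))
    print(params, augmented)
```

Sampling one parameter set per image, rather than per pixel, keeps the color relationships within an image intact while varying its overall palette across training samples.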
